Abstract: As location-based applications become ubiquitous 
in emerging wireless networks, Location Verification Systems 
(LVS) are of growing importance. In this paper we propose, 
for the first time, a rigorous information-theoretic framework 
for an LVS. The theoretical framework we develop illustrates 
how the threshold used in the detection of a spoofed location 
can be optimized in terms of the mutual information between 
the input and output data of the LVS. In order to verify 
the legitimacy of our analytical framework we have carried 
out detailed numerical simulations. Our simulations mimic the 
practical scenario where a system deployed using our framework 
must make a binary Yes/No "malicious decision" for each snapshot 
of the signal strength values obtained by base stations. The 
comparison between simulation and analysis shows excellent 
agreement. Our optimized LVS framework provides a defence 
against location spoofing attacks in emerging wireless networks 
such as those envisioned for Intelligent Transport Systems, where 
verification of location information is of paramount importance. 



I. Introduction 

As Location-Based Services become widely deployed, verifying 
the location information being fed into the location service 
is becoming a critical security issue. The main difference 
between a Location Verification System (LVS) and a localization 
system is that in an LVS we are confronted with some a priori 
information, such as a claimed position [1]-[7]. In the context 
of a main target application of our system, namely Intelligent 
Transport Systems (ITS), the issue of location verification has 
attracted a considerable amount of recent attention [8]-[13]. 
Normally, in order to infer whether a network user or node is 
malicious (attempting to spoof its location) or legitimate 
(actually at the claimed location), we have to set a threshold 
for the LVS. This threshold is set so as to obtain low false 
positive rates for legitimate users and high detection rates 
for malicious users. As such, the specific value of the 
threshold directly affects the performance of an LVS. 

One traditional approach to setting the threshold of an LVS is 
to search for a tradeoff between the false positive rate and the 
detection rate according to the receiver operating characteristic 
(ROC) curve [4]. Another technique is to obtain the false positive 
and detection rates through empirical training data and minimize 
specific functions of the two rates to set the threshold [2], [5], 
[6]. For example, in [4], the sum of the false positive and false 
negative rates was minimized. However, although successful 



in many scenarios, the approaches mentioned above do not 
specify in any formal sense what the 'optimal' threshold value 
of an LVS should be. In addition, in the key target application 
of our LVS, namely ITS, it is not practical to collect the 
required training data because the circumstances are so variable. 

The main point of this paper is to develop, for the first 
time, an information theoretic framework that will allow us 
to formally set the optimal threshold of an LVS. In order 
to do this, we first define a threshold based on the squared 
Mahalanobis distance, which utilizes the Fisher Information 
Matrix (FIM) associated with the location information metrics 
utilized by the LVS. To optimize the threshold, the Intrusion 
Detection Capability (IDC) proposed by Gu et al. [14] for an 
Intrusion Detection System (IDS) will be utilized. The IDC is 
the ratio of the reduction of uncertainty of the IDS input, 
given the output, to the initial uncertainty of the input. As 
such, the IDC measures the capability of an IDS to classify the 
input events correctly. A larger IDC means that the LVS has an 
improved capability of classifying users as malicious or 
legitimate accurately. From an information theoretic point of 
view the optimal threshold is the value that maximizes the IDC. 

The rest of this paper is organized as follows. Section II 
presents the system model, which details the observation 
model and the threat model we utilize. In Section III, the 
threshold is defined in terms of the FIM associated with the 
location metrics. Section III also provides the techniques used 
to determine the false positive and detection rates, which are 
utilized to derive the IDC. Section IV provides the details 
of how the IDC is used in optimizing the threshold. 
Simulation results which validate our new analytical LVS 
framework are presented in Section V. Section VI concludes 
and discusses some future directions. 

II. System model 

A. A Priori Information: Claimed Position 

Let us assume a user can obtain its true position, 
$\theta_t = [x_t, y_t]$, from its localization equipment (e.g., 
GPS), and that the localization error is zero. Thus, a legitimate 
user's claimed (reported) position, $\theta_c = [x_c, y_c]$, is 
exactly the same as its true position $\theta_t$. However, a 
malicious user will falsify (spoof) its claimed position in an 
attempt to fool the LVS. We denote the legitimate and malicious 
hypotheses as $H_0$ and $H_1$, respectively, and the a priori 
information can be summarized as 

$$
\begin{cases}
H_0: \theta_c = \theta_t & \text{(Legitimate)} \\
H_1: \theta_c \neq \theta_t & \text{(Malicious)}.
\end{cases}
\tag{1}
$$

B. Observation Model based on $H_0$ 



Although the framework we develop can be built on any 
location information metric, for purposes of illustration in this 
work we will solely investigate the case where the location 
information metric is the Received Signal Strength (RSS) 
obtained by a Base Station (BS) from a user. The RSS of 
the $i$-th BS from a legitimate user, $P_i$, is assumed to be 
given by 

$$
P_i = P_0 - 10\gamma \log_{10}\left(\frac{d_i^t}{d_0}\right) + w_\sigma,
\tag{2}
$$

where $P_0$ is a reference received power, $d_0$ is the reference 
distance, $\gamma$ is the path loss exponent, $w_\sigma$ is a 
zero-mean normal random variable with variance $\sigma_{dB}^2$, 
and the Euclidean distance of the $i$-th BS to the user's true 
position $[x_t, y_t]$ is 

$$
d_i^t = \sqrt{(x_t - x_i^B)^2 + (y_t - y_i^B)^2}, \quad i = 1, 2, \ldots, N,
$$

where $[x_i^B, y_i^B]$ is the location of the $i$-th BS, and $N$ is 
the number of BSs. For $H_0$ in eq. (1), $d_i^t$ in eq. (2) can be 
replaced by $d_i^c$, where $d_i^c$ is the Euclidean distance of the 
$i$-th BS to the user's claimed position $[x_c, y_c]$ and can be 
expressed as 

$$
d_i^c = \sqrt{(x_c - x_i^B)^2 + (y_c - y_i^B)^2}, \quad i = 1, 2, \ldots, N.
$$
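The observation model of eq. (2) can be sketched in a few lines of Python. This is only an illustration: the parameter values ($P_0 = -30$ dB, $\gamma = 3$, $d_0 = 1$ m, $\sigma_{dB} = 4$ dB), the BS layout and the function names `mean_rss` and `observe_rss` are assumptions made here, not values or names taken from the paper.

```python
import math
import random

# Hypothetical layout: four BSs at the corners of a 200 m x 200 m field.
BS = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]

def mean_rss(pos, bs_positions, p0=-30.0, gamma=3.0, d0=1.0):
    """Noise-free RSS (dB) at each BS for a transmitter at pos, per eq. (2)."""
    x, y = pos
    out = []
    for bx, by in bs_positions:
        d = math.hypot(x - bx, y - by)  # Euclidean distance to this BS
        out.append(p0 - 10.0 * gamma * math.log10(d / d0))
    return out

def observe_rss(pos, bs_positions, sigma_db=4.0, rng=None, **model):
    """One noisy snapshot: mean RSS plus zero-mean Gaussian noise w_sigma."""
    rng = rng or random.Random()
    return [m + rng.gauss(0.0, sigma_db)
            for m in mean_rss(pos, bs_positions, **model)]

# Under H0 the claimed position equals the true position.
claimed = (60.0, 80.0)
snapshot = observe_rss(claimed, BS, rng=random.Random(1))
```

A transmitter located on top of a BS gives $d = 0$ and is outside the validity of the log-distance model, so the sketch does not handle that case.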



C. Threat Model (Observation Model based on $H_1$) 

Let us assume a malicious user knows the positions of all 
BSs and is able to boost its transmit power according to its 
claimed position. The RSS of the $i$-th BS from a malicious 
user, $P_i'$, can be written as 

$$
P_i' = P_0 + P_x - 10\gamma \log_{10}\left(\frac{d_i^t}{d_0}\right) + w_\sigma,
\tag{3}
$$

where $P_x$ is the boost power. We assume the malicious user is 
equipped with only one omni-antenna, and thus $P_x$ is constant 
for all the BSs. 

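In the following, one strategy by which the malicious user can 
set the boost value $P_x$ is provided. A malicious user's claimed 
position is determined by its purpose and the LVS parameters. 
Constrained by the positions of all BSs, the spoofed observations 
$P_i'$ are not exactly the same as the ideal observations 
$\hat{P}_i$ calculated according to its claimed position as 
follows 

$$
\hat{P}_i = P_0 - 10\gamma \log_{10}\left(\frac{d_i^c}{d_0}\right).
$$

However, the malicious user would like the spoofed observations 
$P_i'$ to be as similar as possible to the ideal observations 
$\hat{P}_i$. Thus, it will set a value of $P_x$ to minimize the 
divergence between $P_i'$ and $\hat{P}_i$. This divergence can be 
defined by the Mean Square Error (MSE) as follows 

$$
\mathcal{D} = E\left[\frac{1}{N}\sum_{i=1}^{N}\left(P_i' - \hat{P}_i\right)^2\right],
\qquad
P_i' - \hat{P}_i = P_x - 10\gamma \log_{10}\left(\frac{d_i^t}{d_0}\right) + 10\gamma \log_{10}\left(\frac{d_i^c}{d_0}\right) + w_\sigma,
$$

where $E$ is the expectation with respect to all the observations. 
Then, the value of $P_x$ can be expressed as 
$P_x = \arg\min_{P_x} \mathcal{D}$. 

Taking the first derivative of $\mathcal{D}$ with respect to $P_x$ 
and setting it to zero, we can obtain $P_x$ as 

$$
P_x = \frac{1}{N}\sum_{k=1}^{N} 10\gamma \log_{10}\left(\frac{d_k^t}{d_0}\right) - \frac{1}{N}\sum_{k=1}^{N} 10\gamma \log_{10}\left(\frac{d_k^c}{d_0}\right).
$$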
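The minimization step can be made explicit. The following is a sketch under the assumptions that the ideal observations are noise-free and that $w_\sigma$ is zero-mean with variance $\sigma_{dB}^2$; the shorthand $\Delta_k$ is introduced here only for compactness and is not part of the original notation.

$$
\begin{aligned}
\Delta_k &= 10\gamma \log_{10}\left(\frac{d_k^t}{d_0}\right) - 10\gamma \log_{10}\left(\frac{d_k^c}{d_0}\right), \\
\mathcal{D}(P_x) &= \frac{1}{N}\sum_{k=1}^{N} E\left[(P_x - \Delta_k + w_\sigma)^2\right]
= \frac{1}{N}\sum_{k=1}^{N}(P_x - \Delta_k)^2 + \sigma_{dB}^2, \\
\frac{d\mathcal{D}}{dP_x} &= \frac{2}{N}\sum_{k=1}^{N}(P_x - \Delta_k) = 0
\;\Rightarrow\; P_x = \frac{1}{N}\sum_{k=1}^{N}\Delta_k .
\end{aligned}
$$

Since the second derivative equals $2 > 0$, this stationary point is the minimum. Under these assumptions the noise only adds the constant $\sigma_{dB}^2$ to the divergence, so the optimal boost is simply the average path-loss gap between the true and the claimed positions.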



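In the above we use $k$ instead of $i$ in the equations related to 
$P_x$ to avoid confusing them with the $H_0$ observation model. 
Substituting $P_x$ into eq. (3), the threat model (observation 
model based on $H_1$) can be rewritten as 

$$
P_i' = P_0 + P_i^* - \frac{1}{N}\sum_{k=1}^{N} 10\gamma \log_{10}\left(\frac{d_k^c}{d_0}\right) + w_\sigma,
\tag{4}
$$

where 

$$
P_i^* = \frac{1}{N}\sum_{k=1}^{N} 10\gamma \log_{10}\left(\frac{d_k^t}{d_0}\right) - 10\gamma \log_{10}\left(\frac{d_i^t}{d_0}\right).
$$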

Eq. (4) is the general threat model based on RSS, but it is not 
practical since a malicious user's true position is unknown. 
We can approximate the threat model by assuming $\theta_t$ follows 
a distribution. Here, due to the limited space, let us assume 
a malicious user is effectively infinitely far away from all BSs, 
which facilitates the analysis of the LVS (the more general case 
is discussed later). Given this assumption, the distances from 
the user to all the BSs converge to one value. That is, the 
distance of a malicious user's true position to every BS is 
nearly a constant number $d_{far}$, i.e., $d_i^t = d_{far}$, 
$d_k^t = d_{far}$, $i, k = 1, 2, \ldots, N$. Therefore, the term 
$P_i^*$ can be rewritten as 

$$
P_i^* = \frac{1}{N}\sum_{k=1}^{N} 10\gamma \log_{10}\left(\frac{d_{far}}{d_0}\right) - 10\gamma \log_{10}\left(\frac{d_{far}}{d_0}\right) = 0.
$$

Based on the above analysis, the threat model can be expressed 
as 

$$
P_i' = P_0 - \frac{1}{N}\sum_{k=1}^{N} 10\gamma \log_{10}\left(\frac{d_k^c}{d_0}\right) + w_\sigma.
\tag{5}
$$
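The threat model can be illustrated numerically. The sketch below computes the boost $P_x$ and the mean spoofed RSS of eq. (3), and compares a very distant attacker with the far-user limit of eq. (5). All parameter values, the BS layout and the function names are assumptions chosen for illustration only.

```python
import math

BS = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]  # assumed layout

def path_loss(d, gamma=3.0, d0=1.0):
    """Log-distance path loss term 10*gamma*log10(d/d0) in dB."""
    return 10.0 * gamma * math.log10(d / d0)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def boost_power(true_pos, claimed_pos, bs):
    """P_x: average over BSs of path loss at the true minus the claimed position."""
    gaps = [path_loss(dist(true_pos, b)) - path_loss(dist(claimed_pos, b))
            for b in bs]
    return sum(gaps) / len(gaps)

def spoofed_mean_rss(true_pos, claimed_pos, bs, p0=-30.0):
    """Noise-free version of eq. (3) with the MSE-optimal boost."""
    px = boost_power(true_pos, claimed_pos, bs)
    return [p0 + px - path_loss(dist(true_pos, b)) for b in bs]

def far_user_mean_rss(claimed_pos, bs, p0=-30.0):
    """Noise-free version of eq. (5): the same mean RSS at every BS."""
    avg = sum(path_loss(dist(claimed_pos, b)) for b in bs) / len(bs)
    return [p0 - avg] * len(bs)

claimed = (60.0, 80.0)
near = spoofed_mean_rss((150.0, 40.0), claimed, BS)   # attacker inside the field
far = spoofed_mean_rss((1.0e7, 1.0e7), claimed, BS)   # attacker very far away
limit = far_user_mean_rss(claimed, BS)
```

In this sketch the boost always makes the average spoofed RSS over the BSs equal to the average ideal RSS; what the attacker cannot reproduce with a single omni-antenna is the per-BS pattern, which is the information the LVS exploits.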



III. Threshold and Two Rates 



In this section, we first present our threshold based on 
the squared Mahalanobis distance, which utilizes the inverse 
FIM. Then, we provide techniques used to determine the false 
positive rate $\alpha$ and the detection rate $\beta$ of our LVS. 



A. Threshold 



The threshold is defined in terms of the squared Mahalanobis 
distance of an estimated position vector 
$\hat{\theta} = [\hat{x}, \hat{y}]$.$^{1}$ The squared Mahalanobis 
distance can be expressed as [15] 

$$
D_M = (\hat{\theta} - \bar{\theta}) M^{-1} (\hat{\theta} - \bar{\theta})^T,
$$

where $\bar{\theta}$ is the mean of $\hat{\theta}$ and $M$ is the 
covariance matrix of $\hat{\theta}$. According to its definition, 
$D_M$ is a dimensionless scalar that involves not only the 
Euclidean distance but also the geometric information. In an LVS, 
we are interested in the 'distance' between a user's estimated 
position $\hat{\theta}$ and its claimed position $\theta_c$. Thus, 
we will use $\theta_c$ instead of $\bar{\theta}$ to calculate 
$D_M$. In addition, without any a priori results from a 
localization algorithm, we cannot obtain any estimate of the 
covariance matrix $M$. Therefore, we will utilize the inverse FIM, 
$M_c$, to approximate $M$. With this, the squared Mahalanobis 
distance in our LVS can be written as 

$$
D_M = (\hat{\theta} - \theta_c) M_c^{-1} (\hat{\theta} - \theta_c)^T,
$$

where $M_c = F^{-1}$ and $F$ is the FIM to be calculated as 
given below. In practice, the LVS works on the observation 
model based on $H_0$, and the likelihood function of received 
powers can be obtained using eq. (2). Let us assume the 
observations received by different BSs are independent; then 
the log-likelihood function can be expressed as 

$$
l(P|\theta_t) = -\frac{1}{2\sigma_{dB}^2}\sum_{i=1}^{N}\left[P_i - P_0 + 10\gamma \log_{10}\left(\frac{d_i^t}{d_0}\right)\right]^2 + \log C,
$$

where $P$ is the $N$-dimensional observation vector and the 
constant number $C$ is 

$$
C = \frac{1}{(2\pi)^{N/2}\sigma_{dB}^{N}}.
$$

Then, we can calculate the terms of the FIM through 

$$
F_{xy} = -E\left[\frac{\partial^2 l(P|\theta_t)}{\partial x\,\partial y}\right],
$$

where $E$ represents the expectation operation with respect to 
all observations. After some algebra, the FIM can be written 
as [16], 

$$
F = \begin{bmatrix}
b\sum_{i=1}^{N}\dfrac{\cos^2\phi_i}{(d_i^t)^2} & \dfrac{b}{2}\sum_{i=1}^{N}\dfrac{\sin 2\phi_i}{(d_i^t)^2} \\
\dfrac{b}{2}\sum_{i=1}^{N}\dfrac{\sin 2\phi_i}{(d_i^t)^2} & b\sum_{i=1}^{N}\dfrac{\sin^2\phi_i}{(d_i^t)^2}
\end{bmatrix},
$$

where 

$$
b = \left(\frac{10\gamma}{\sigma_{dB}\ln 10}\right)^2, \qquad
\phi_i = \arctan\frac{y_t - y_i^B}{x_t - x_i^B}.
$$

$^{1}$ Note that an equivalent description of our LVS, which does 
not introduce the Mahalanobis distance, can be given in terms of 
the Cramer-Rao Lower Bound $\sigma_{CR}$. In this alternative 
description, an error ellipse is derived directly from the FIM, 
with the scale of the ellipse being set by $\sigma_{CR}$ and the 
orientation being set by the eigenvectors of the inverse FIM. For 
different values of the threshold $T$ the ellipse size scales as 
$T\sigma_{CR}$, and the detection algorithm decides the user is 
malicious if the estimated position returned by the location MLE 
lies outside of the ellipse. 

After setting a threshold parameter $T$ for the squared 
Mahalanobis distance, the decision rule of an LVS (i.e., whether 
a user is malicious or not) can be expressed as follows 

$$
\begin{cases}
D_M \le T \;\Rightarrow\; H_0 & \text{(Legitimate)} \\
D_M > T \;\Rightarrow\; H_1 & \text{(Malicious)}.
\end{cases}
\tag{6}
$$
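A minimal sketch of the threshold test is given below. It evaluates the FIM of the RSS model at the claimed position, forms the squared Mahalanobis distance with $M_c^{-1} = F$, and applies the decision rule of eq. (6). The parameter values, the BS layout, the function names and the example threshold are assumptions used purely for illustration.

```python
import math

BS = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]  # assumed layout

def fisher_information(pos, bs, gamma=3.0, sigma_db=4.0):
    """Entries (Fxx, Fxy, Fyy) of the 2x2 FIM of the RSS model at pos."""
    b = (10.0 * gamma / (sigma_db * math.log(10.0))) ** 2
    fxx = fxy = fyy = 0.0
    for bx, by in bs:
        dx, dy = pos[0] - bx, pos[1] - by
        d4 = (dx * dx + dy * dy) ** 2
        # cos(phi) = dx/d and sin(phi) = dy/d, so cos^2(phi)/d^2 = dx^2/d^4,
        # and (1/2) sin(2 phi)/d^2 = dx*dy/d^4.
        fxx += b * dx * dx / d4
        fxy += b * dx * dy / d4
        fyy += b * dy * dy / d4
    return fxx, fxy, fyy

def squared_mahalanobis(est, claimed, fim):
    """D_M = (est - claimed) F (est - claimed)^T, using M_c^{-1} = F."""
    fxx, fxy, fyy = fim
    ex, ey = est[0] - claimed[0], est[1] - claimed[1]
    return fxx * ex * ex + 2.0 * fxy * ex * ey + fyy * ey * ey

def is_malicious(est, claimed, bs, threshold, **model):
    """Decision rule of eq. (6): flag the user when D_M exceeds T."""
    fim = fisher_information(claimed, bs, **model)
    return squared_mahalanobis(est, claimed, fim) > threshold

claimed = (100.0, 100.0)
decision = is_malicious((200.0, 100.0), claimed, BS, threshold=4.75)
```

Working with $F$ directly avoids inverting $M_c$, which is convenient because the quadratic form in $D_M$ only needs $M_c^{-1}$.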



Note that we are able to transform any covariance matrix 
into a diagonal matrix by rotating the position vector [17]. 
Thus, the general form of $M_c$ can be expressed as 

$$
M_c = \begin{bmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{bmatrix}.
$$

Then, the threshold $T$ can be encapsulated within the equation 
for an ellipse as follows 

$$
\frac{(x - x_c)^2}{T\sigma_x^2} + \frac{(y - y_c)^2}{T\sigma_y^2} = 1.
$$

Therefore, the threshold $T$ can also be understood as an ellipse, 
denoted as $\mathcal{T}$, which is determined by extending the 
error ellipse provided by the FIM with the threshold parameter $T$. 

Based on the above analysis, the overall process of an LVS 
includes four steps 

• Collect observations of the RSS received from a user by 
each BS; 

• Apply a localization algorithm to obtain an estimated 
position $\hat{\theta}$; 

• Calculate the squared Mahalanobis distance $D_M$ of 
$\hat{\theta}$ to the user's claimed position $\theta_c$; 

• Infer if the user is legitimate or malicious according to 
the decision rule in eq. (6). 
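The four steps can be strung together as in the following sketch. It uses a coarse grid search as a stand-in for the localization algorithm (under independent Gaussian noise, minimizing the squared residual over the grid approximates the MLE), and it simulates a far-away attacker through the constant mean RSS of the far-user threat model. All numerical values, the grid, and the function names are assumptions made for this illustration.

```python
import math
import random

BS = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]
P0, GAMMA, D0, SIGMA_DB = -30.0, 3.0, 1.0, 2.0  # assumed values

def mean_rss(pos):
    """Noise-free RSS at every BS under the H0 observation model."""
    return [P0 - 10.0 * GAMMA * math.log10(math.hypot(pos[0] - bx, pos[1] - by) / D0)
            for bx, by in BS]

def grid_mle(rss, lo=1.0, hi=199.0, step=2.0):
    """Step 2: least-squares grid search (odd coordinates avoid the BS sites)."""
    n = int(round((hi - lo) / step)) + 1
    best, best_cost = None, float("inf")
    for i in range(n):
        for j in range(n):
            pos = (lo + i * step, lo + j * step)
            cost = sum((r - m) ** 2 for r, m in zip(rss, mean_rss(pos)))
            if cost < best_cost:
                best, best_cost = pos, cost
    return best

def squared_mahalanobis(est, claimed):
    """Step 3: D_M with the FIM evaluated at the claimed position."""
    b = (10.0 * GAMMA / (SIGMA_DB * math.log(10.0))) ** 2
    fxx = fxy = fyy = 0.0
    for bx, by in BS:
        dx, dy = claimed[0] - bx, claimed[1] - by
        d4 = (dx * dx + dy * dy) ** 2
        fxx += b * dx * dx / d4
        fxy += b * dx * dy / d4
        fyy += b * dy * dy / d4
    ex, ey = est[0] - claimed[0], est[1] - claimed[1]
    return fxx * ex * ex + 2.0 * fxy * ex * ey + fyy * ey * ey

def verify(rss, claimed, threshold):
    """Steps 2 to 4: returns (flagged as malicious, estimate, D_M)."""
    est = grid_mle(rss)
    d_m = squared_mahalanobis(est, claimed)
    return d_m > threshold, est, d_m

claimed = (61.0, 81.0)
rng = random.Random(7)
ideal = mean_rss(claimed)
# Step 1 (simulated): one snapshot from a legitimate and from a far-away user.
legit_rss = [m + rng.gauss(0.0, SIGMA_DB) for m in ideal]
spoof_rss = [sum(ideal) / len(ideal) + rng.gauss(0.0, SIGMA_DB) for _ in BS]
legit_result = verify(legit_rss, claimed, threshold=4.75)
spoof_result = verify(spoof_rss, claimed, threshold=4.75)
```

The outcome for a single noisy snapshot depends on the noise realization; only the error rates over many snapshots, i.e., $\alpha$ and $\beta$, characterize the LVS.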

In practice, the above are all the steps of our LVS. However, 
to evaluate an LVS, false positive and detection rates, which 
are functions of the threshold parameter T and other LVS 
parameters, are always investigated in theory. In the following 
subsections, we provide techniques used to determine false 
positive and detection rates in order to optimize the threshold 
parameter T. 

B. False Positive Rate 

The false positive rate $\alpha$ is the probability with which 
legitimate users are judged to be malicious. For a legitimate 
user, $\theta_c = \theta_t$. Then, in the 2-D physical space, the 
false positive rate can be expressed as $\alpha = e^{-T/2}$ [17]. 

In fact, the quantity $(1 - \alpha)$, i.e., the probability that a 
legitimate user is correctly accepted, is a well known metric that 
underlies the performance of unbiased localization algorithms. For 
example, in the 2-D physical space, it states that the probability 
with which an estimated position lies within the ellipse with 
$T = 1$ is no more than 39.35%. 
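The closed form for $\alpha$ can be checked with a short Monte Carlo experiment. The sketch assumes an idealized estimator whose error is zero-mean Gaussian with covariance exactly equal to the diagonal $M_c$; the standard deviations, trial count and function names are arbitrary choices for illustration.

```python
import math
import random

def false_positive_rate(threshold):
    """Closed form alpha = exp(-T/2) for a 2-D position estimate."""
    return math.exp(-threshold / 2.0)

def simulated_false_positive_rate(threshold, sigma_x=30.0, sigma_y=20.0,
                                  trials=100_000, seed=0):
    """Fraction of legitimate estimates falling outside the threshold ellipse."""
    rng = random.Random(seed)
    outside = 0
    for _ in range(trials):
        ex = rng.gauss(0.0, sigma_x)   # estimation error along x
        ey = rng.gauss(0.0, sigma_y)   # estimation error along y
        d_m = (ex / sigma_x) ** 2 + (ey / sigma_y) ** 2
        if d_m > threshold:
            outside += 1
    return outside / trials

# Probability of lying inside the T = 1 ellipse: 1 - exp(-0.5), about 0.3935.
inside_t1 = 1.0 - false_positive_rate(1.0)
alpha_sim = simulated_false_positive_rate(4.75)
```

The agreement follows because, under the Gaussian assumption, $D_M$ is chi-square distributed with two degrees of freedom, whose tail probability is $e^{-T/2}$.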




C. Detection Rate 

The detection rate $\beta$ is the probability that malicious users 
are recognized as malicious. In order to calculate $\beta$, we 
have to obtain the posterior probability density function (pdf) 
for a location given some RSS observation vector, which can 
be expressed as 

$$
f(\theta|P) = \frac{f(P|\theta)\,f(\theta)}{f(P)},
$$

where $\theta = [x, y]$ is a general location, and 
$P = [P_1, P_2, \ldots, P_N]$ is the observation vector. Of course, 
if the user is malicious the observed signal vector $P$ will be one 
that has undergone a boost as described by eq. (5). Let us 
denote the average value of this spoofed observation vector as 
$\bar{P}$. Given this, the likelihood function $f(\bar{P}|\theta)$ 
can be derived from eq. (2). If we take $\theta$ to be a uniform 
variable vector, then the detection rate $\beta$ can be calculated 
as 

$$
\beta = 1 - \iint_{[x,y]\in\mathcal{T}} f(\theta|\bar{P})\,dx\,dy
= 1 - \frac{1}{A_1}\iint_{[x,y]\in\mathcal{T}} f(\bar{P}|\theta)\,f(\theta)\,dx\,dy,
$$

where $A_1$ is a normalizing constant that can be written as 

$$
A_1 = f(\bar{P}) = \iint f(\bar{P}|\theta)\,f(\theta)\,dx\,dy.
$$

Because $f(\theta)$ is constant under the uniform assumption, it 
cancels between the numerator and $A_1$. 

Numerical methods are utilized to evaluate the above integral 
for $\beta$ since there is no closed form solution. Based 
on the above analysis, $\beta$ is also a function of $T$. 
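One way to carry out the numerical evaluation is a midpoint-rule sum over a grid, sketched below for the far-user threat model. The integration window, grid step, parameter values and function names are assumptions made here; the uniform prior is taken over the window, so it cancels and only the likelihood is summed.

```python
import math

BS = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]
P0, GAMMA, D0, SIGMA_DB = -30.0, 3.0, 1.0, 2.0  # assumed values

def mean_rss(pos):
    """Noise-free RSS at every BS under the H0 observation model."""
    return [P0 - 10.0 * GAMMA * math.log10(math.hypot(pos[0] - bx, pos[1] - by) / D0)
            for bx, by in BS]

def fim(pos):
    """Entries (Fxx, Fxy, Fyy) of the FIM at pos."""
    b = (10.0 * GAMMA / (SIGMA_DB * math.log(10.0))) ** 2
    fxx = fxy = fyy = 0.0
    for bx, by in BS:
        dx, dy = pos[0] - bx, pos[1] - by
        d4 = (dx * dx + dy * dy) ** 2
        fxx += b * dx * dx / d4
        fxy += b * dx * dy / d4
        fyy += b * dy * dy / d4
    return fxx, fxy, fyy

def detection_rate(claimed, threshold, lo=-200.0, hi=400.0, step=4.0):
    """beta = 1 - (posterior mass of theta inside the threshold ellipse)."""
    ideal = mean_rss(claimed)
    p_bar = [sum(ideal) / len(ideal)] * len(BS)   # far-user spoofed mean RSS
    fxx, fxy, fyy = fim(claimed)
    n = int(round((hi - lo) / step))
    inside = total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step                 # cell midpoints miss the BS sites
        for j in range(n):
            y = lo + (j + 0.5) * step
            sq = sum((p - m) ** 2 for p, m in zip(p_bar, mean_rss((x, y))))
            w = math.exp(-sq / (2.0 * SIGMA_DB ** 2))  # unnormalized likelihood
            total += w
            ex, ey = x - claimed[0], y - claimed[1]
            if fxx * ex * ex + 2.0 * fxy * ex * ey + fyy * ey * ey <= threshold:
                inside += w
    return 1.0 - inside / total

beta = detection_rate((61.0, 81.0), threshold=4.75)
```

With this BS layout a claimed position at the exact center of the field would make the spoofed pattern coincide with the legitimate one, so an off-center claimed position is used in the example.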

As an aside it is worth mentioning that the false positive 
rate $\alpha$ can also be written in a similar form as follows 

$$
\alpha = 1 - \frac{1}{A_0}\iint_{[x,y]\in\mathcal{T}} f(\tilde{P}|\theta)\,f(\theta)\,dx\,dy,
$$

where $\tilde{P}$ is the average non-spoofed observation vector and 

$$
A_0 = f(\tilde{P}) = \iint f(\tilde{P}|\theta)\,f(\theta)\,dx\,dy.
$$

IV. Optimization of the Threshold 

In this section we will optimize the value of the threshold by 
maximizing the IDC, which is a function of the false positive 
rate $\alpha$, detection rate $\beta$ and the base rate $B$ (the a 
priori probability of intrusion in the input event data). That is, 
our optimization procedure is to find the value of $T$ that 
maximizes the IDC. From an information theoretic point of view, 
the IDC is a metric that measures the capability of an IDS to 
classify the input events correctly and is defined as [14] 

$$
C_{IDC} = \frac{I(X;Y)}{H(X)} = \frac{H(X) - H(X|Y)}{H(X)},
\tag{7}
$$

where $H(X)$ is the entropy of the input data $X$, $I(X;Y)$ is 
the mutual information of input data $X$ and output data $Y$, 
and $H(X|Y)$ is the conditional entropy. Mutual information 
$I(X;Y)$ measures the reduction of uncertainty of the input $X$ 
given the output $Y$. Thus, $C_{IDC}$ is the ratio of the reduction 
of uncertainty of the input given the output to the initial 
uncertainty of the input. Its value range is [0, 1]. A larger 
$C_{IDC}$ value means that the IDS has an improved capability of 
classifying input events accurately. 

Our LVS can be modeled as an IDS whose input data are the 
claimed positions, and the output data are the binary decisions. 
Then, $X = 0$ represents an actual claimed position from a 
legitimate user, $X = 1$ represents a spoofed claimed position 
from a malicious user, $Y = 0$ indicates that the LVS infers the 
user is legitimate, and $Y = 1$ indicates the user is malicious. 
Accordingly, the false positive rate $\alpha$ is the probability 
$\mathcal{P}(Y = 1|X = 0)$, and the detection rate $\beta$ is the 
probability $\mathcal{P}(Y = 1|X = 1)$. 

Therefore, the optimal value of $T$ is the one that maximizes 
the value of the $C_{IDC}$ of the LVS. 

The realizations of input and output data are denoted as $z_x$ 
and $z_y$, respectively. Given the base rate $B$, the entropy of 
the input data $H(X)$ can be written as [18] 

$$
H(X) = -\sum_{z_x} p(z_x)\log p(z_x) = -B\log B - (1-B)\log(1-B).
$$

The conditional entropy $H(X|Y)$ can be expressed as 

$$
\begin{aligned}
H(X|Y) &= -\sum_{z_x}\sum_{z_y} p(z_x)\,p(z_y|z_x)\log\frac{p(z_x)\,p(z_y|z_x)}{p(z_y)} \\
&= -B\beta\log\frac{B\beta}{B\beta + (1-B)\alpha}
 - B(1-\beta)\log\frac{B(1-\beta)}{B(1-\beta) + (1-B)(1-\alpha)} \\
&\quad - (1-B)(1-\alpha)\log\frac{(1-B)(1-\alpha)}{(1-B)(1-\alpha) + B(1-\beta)}
 - (1-B)\alpha\log\frac{(1-B)\alpha}{(1-B)\alpha + B\beta}.
\end{aligned}
$$

Numerical methods are applied in order to search for the 
optimal value of $T$ since there is no closed form for $\beta$. 
In the following we refer to this optimal value as $T_0$. 
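The IDC and the search for $T_0$ can be sketched as follows. The function names are chosen here for illustration, and `beta_placeholder` is a purely hypothetical stand-in for the numerically evaluated detection rate; it is not a result from the paper and should be replaced by a real evaluation of $\beta(T)$.

```python
import math

def entropy(base_rate):
    """H(X) in bits for a binary input with P(X = 1) = B, 0 < B < 1."""
    b = base_rate
    return -b * math.log2(b) - (1.0 - b) * math.log2(1.0 - b)

def conditional_entropy(alpha, beta, base_rate):
    """H(X|Y) in bits from the four joint probabilities p(x, y)."""
    b = base_rate
    joint = {
        (1, 1): b * beta,                     # malicious, flagged
        (1, 0): b * (1.0 - beta),             # malicious, missed
        (0, 0): (1.0 - b) * (1.0 - alpha),    # legitimate, accepted
        (0, 1): (1.0 - b) * alpha,            # legitimate, flagged
    }
    p_y = {1: joint[(1, 1)] + joint[(0, 1)], 0: joint[(1, 0)] + joint[(0, 0)]}
    h = 0.0
    for (_, y), p in joint.items():
        if p > 0.0:                           # convention: 0 * log(0) = 0
            h -= p * math.log2(p / p_y[y])
    return h

def idc(alpha, beta, base_rate=0.5):
    """C_IDC = (H(X) - H(X|Y)) / H(X), as in eq. (7)."""
    hx = entropy(base_rate)
    return (hx - conditional_entropy(alpha, beta, base_rate)) / hx

def best_threshold(thresholds, alpha_fn, beta_fn, base_rate=0.5):
    """Grid search for the threshold that maximizes the IDC."""
    return max(thresholds, key=lambda t: idc(alpha_fn(t), beta_fn(t), base_rate))

def beta_placeholder(t):
    """Hypothetical decreasing detection rate, for demonstration only."""
    return math.exp(-t / 20.0)

grid = [0.25 * k for k in range(1, 81)]       # candidate T values in (0, 20]
t_opt = best_threshold(grid, lambda t: math.exp(-t / 2.0), beta_placeholder)
```

Because $C_{IDC}$ is a ratio of entropies, the base of the logarithm does not affect its value; base 2 is used only so that the entropies are in bits.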

V. Simulation Result 

Adopting a Maximum Likelihood Estimator (MLE) in our location 
estimation algorithm we now verify, via detailed simulations, our 
previous analysis. The theoretical and simulated $\alpha$, $\beta$ 
and $C_{IDC}$, all of which are dependent on $T$, are utilized 
in order to find the value $T_0$ that maximizes $C_{IDC}$. 



A. Simulation Set-up 

The simulation settings are as follows: 

• $N$ BSs are deployed in a 200 m × 200 m square field and 
the legitimate and honest users can communicate with all 
BSs; 

• The claimed positions of honest and malicious users are 
the same, denoted $\theta_c$; 

• $S$ observations are collected from each base station; 

• The BSs are set at fixed positions (we investigate a range 
of fixed locations); 

• The results shown are averaged over 1,000 Monte Carlo 
realizations of the estimated position, with the base 
rate $B = 50\%$ for all the simulations. 

B. $\alpha$, $\beta$, $C_{IDC}$ with Different Values of $T$ 

As shown in Fig. 1, the solid lines are the theoretical $\alpha$, 
$\beta$ and $C_{IDC}$ while the symbols are the simulated $\alpha$, 
$\beta$ and $C_{IDC}$. The simulated values of $\alpha$ and $\beta$ 
are calculated directly according to the realizations of estimated 
positions, and then the simulated $C_{IDC}$ is obtained from 
eq. (7). The simulation parameters are shown in the figure caption 
and the theoretical optimal value $T_0$ can be seen to be 4.75 
(note that in all the figures explicitly shown in this paper the 
four BSs are fixed at the corners of a 200 m × 200 m grid). The 
comparison between simulation and analysis shows excellent 
agreement. Beyond the simulations explicitly shown in Fig. 1, we 
have investigated a range of other fixed BS positions (up to 10 
BSs whose positions are randomly selected), and these simulations 
also show excellent agreement with the analysis. Collectively, 
these simulation results verify the analysis we have provided 
earlier. 

The simulation results with a malicious user at a finite 
distance from all BSs are shown in Fig. 2. The true position of 
the malicious user in the simulations is set at 10 km away from 
the claimed position. Although the simulation and theoretical 
values of $\alpha$, $\beta$ and $C_{IDC}$ do not match each other 
exactly (the theoretical analysis approximates the user as being 
at infinity), the simulation and theoretical optimal values $T_0$ 
are effectively the same. We find this result holds down to 
distances at which the malicious user is a few km away from the 
claimed position. This shows that our framework remains tenable 
when the assumption that the malicious user is infinitely far 
away is relaxed down to the few km range. 

In order to verify that the $C_{IDC}$ obtained with the optimal 
value $T_0$ is correct, we also simulated $C_{IDC}$ for a range of 
$\sigma_{dB}$. Fig. 3 shows such results for the case where the 
malicious user is effectively at infinity. Here the optimal value 
$T_0$ is derived from the proposed theoretical analysis, but in 
the simulations the threshold is also set to the other values of 
$T$ shown ($2T_0$ and $0.5T_0$). From the results shown we can see 
that these other values do provide simulated false positive and 
detection rates which result in lower values of $C_{IDC}$ (and 
therefore sub-optimal performance), which once again verifies the 
robustness of our analytical framework. Fig. 4 shows the same 
results except that the malicious user is again set at 10 km away 
from the claimed position. Again we see a validation of our 
analysis. 

VI. Conclusion and Future Work 

In this paper, we have proposed a novel and rigorous 
information theoretic framework for an LVS. The theoretical 
framework we have developed shows how the value of the 
threshold used in the detection of a spoofed location can be 
optimized in terms of the mutual information between the 
input and output data. In order to verify the legitimacy of our 
framework we have carried out detailed numerical simulations 
of our framework under the assumption of an idealized threat 
model in which the malicious user is far enough from the 
claimed location such that his boosted signal strength results 
in all BSs receiving the same RSS (modulo noise). Our 
numerical simulations mimic the practical scenario where a 
system deployed using our framework must make a binary 
Yes/No "malicious decision" to each snapshot of RSS values 
obtained by the BSs. The comparison between simulation and 
analysis shows excellent agreement. Other simulations where 
we modify the approximation of constant RSS at BSs also 
showed very good agreement with analysis. 

The work described in this paper formalises the performance 
of an optimal LVS under the simplest (and perhaps most likely) 
scenario, where a single malicious user attempts to spoof his 
location to a wider wireless network. The practical scenario we 
had in mind whilst carrying out our simulations was an ITS in 
which another vehicle is attempting to provide falsified location 
information to the wider vehicular network. Future work related 
to our new framework will include the formal inclusion of more 
sophisticated threat models, where the malicious user is both 
closer to the claimed location and has the use of colluding 
adversaries. It is well known that no LVS can be made foolproof 
under the colluding adversary scenario;$^{2}$ however, we will 
investigate in a formal information theoretic sense the detailed 
nature of the vulnerability of an LVS under such different threat 
models. 

$^{2}$ Note that location verification in the context of quantum 
communications systems has previously been considered, e.g., [19], 
[20], [21], and it has been argued that such systems are able to 
securely verify a location under all known threat models [22], 
although see [23], who argue otherwise. It is undisputed that 
classical communications alone cannot achieve secure location 
verification under all known threat models. 